ABSTRACT

Iris recognition has drawn a lot of attention since the
mid-twentieth century. Among all biometric features, the iris is
known to possess a rich set of features. Different features have
been used to perform iris recognition in the past. In this paper,
two powerful sets of features are introduced for iris
recognition: scattering transform-based features and textural
features. PCA is also applied to the extracted features to
reduce the dimensionality of the feature vector while preserving
most of its information. A minimum distance classifier is used
to perform template matching for each new test sample. The
proposed scheme is tested on a well-known iris database and
shows promising results, with a best accuracy rate of 99.2%.


1. INTRODUCTION 

To personalize an experience, or to make an application more
secure and less accessible to undesired people, we need to be
able to distinguish a person from everyone else. This is done by
using distinguishing marks of the users to identify them, and
then either blocking unauthorized access or personalizing the
service based on the trusted identity. Many alternatives are
available for this purpose, such as keys, passwords and cards.
The most secure options so far, however, are biometric features,
which cannot be imitated by anyone other than the intended
person. They are divided into behavioral features that the
person can uniquely create or express, such as signatures and
walking rhythm, and physiological characteristics that the
person possesses, such as fingerprints and iris patterns. Many
works revolve around identification and verification of such
data including, but not limited to, fingerprints [1], palmprints
[2]-[3], faces and iris patterns.

Iris recognition systems are widely used for security
applications, since irises contain a rich set of features and do
not change significantly over time. They are also virtually
impossible to fake. One of the first modern algorithms for iris
recognition was developed by John Daugman and used the 2D Gabor
wavelet transform. In a more recent work, Kumar [7] proposed a
combination of Log-Gabor, Haar wavelet, DCT and FFT based
features to achieve high accuracy. In [8], Farouk proposed a
scheme which uses elastic graph matching and Gabor wavelets.
Each iris is represented as a labeled graph, and a similarity
function is defined to compare two graphs. Belcher used a
region-based SIFT descriptor for iris recognition and achieved
relatively good performance. Pillai proposed a unified framework
based on random projections and sparse representations to
achieve robust and accurate iris matching. Comprehensive surveys
of iris recognition are available in the literature.

In most iris recognition works, the iris region is first
detected and the iris is mapped to a rectangular region in polar
coordinates. Various iris segmentation algorithms have been
developed during the past few years. Foreground segmentation
approaches can also be used for iris segmentation. It is worth
mentioning that in our work no segmentation is performed to
extract the iris region from the eye image, which makes the
method much easier to implement. In this work, two sets of
features are extracted from iris images: the recently introduced
scattering-transform features, and textural features that
capture the texture information of irises. We believe that,
combined, these features provide high discriminating power for
the recognition task. After the features are extracted, their
dimensionality is reduced by applying PCA, and then a minimum
distance classifier is used to recognize new iris images.
Skipping the segmentation step makes our algorithm very fast,
and it can easily be implemented in electronic devices for
real-time applications using energy-efficient implementation and
power management. This algorithm is tested on the well-known IIT
Delhi iris database, and a very high accuracy rate is achieved.
Three sample iris images from the dataset used in this work are
shown in Figure 1.



Fig. 1. Three different iris images 

The rest of the paper is organized as follows. Section 2
describes the features which are used in this work. Section 3
contains the explanation of the classification scheme. The
results of our experiments and comparisons with other works are
in Section 4, and the paper is concluded in Section 5.


2. FEATURES 

Extracting good features and image descriptors is one of the
most important steps in many computer vision and object
recognition algorithms. As a result, many researchers have
focused on designing useful features which can be used for a
variety of object recognition and image classification tasks. A
good feature should have some degree of invariance with respect
to translation, slight rotation and deformation. Many popular
features and image descriptors are in use today, including the
scale-invariant feature transform (SIFT), histogram of oriented
gradients (HOG) and bag of words (BoW). Geometrical features and
sparsity-based features are also used for some biometric and
medical applications in several works. An algorithm for feature
selection on small datasets has also been presented. Recently,
unsupervised feature learning algorithms have been in the
spotlight, where the image is fed directly as the input to a
deep neural network and the algorithm itself finds the best set
of features from the image.

For iris recognition, various features have been used by
several researchers, including wavelet-based features, PCA and
FDA. In this paper, a combination of two feature sets is used:
one derived from the scattering transform, and the other from
the textural information of iris patterns. These features are
introduced in more detail in the following subsections.

2.1. Scattering Features 

The scattering operator is a locally translation-invariant
descriptor proposed by Stephane Mallat, and it has achieved
state-of-the-art recognition accuracy in several computer vision
and audio classification [23] problems. A scattering transform
computes local image descriptors with a cascade of three
operations: wavelet decomposition, complex modulus and local
averaging. The scattering coefficients are similar to those of
the SIFT descriptor, but they contain more high-frequency
information than SIFT. Other image descriptors such as SIFT and
multiscale Gabor textons can be obtained by averaging the
amplitude of wavelet coefficients calculated with directional
wavelets. This averaging provides some local translation
invariance, but it also reduces the high-frequency information.
The scattering transform recovers part of the high-frequency
information lost by this averaging through co-occurrence
coefficients that have the same invariance.

In most object recognition tasks, locally invariant features
are preferred, since they provide a robust representation of
images. They can be seen as averaged values of gradient
orientations. This averaging tolerates some local deformation
and translation. However, it also removes much of the
high-frequency information and can therefore greatly decrease
discriminating capability. The scattering features provide
richer descriptors for complex structures such as corners and
multiscale texture variations.

The scattering operator is designed so that it preserves the
local invariance property of SIFT while also recovering the lost
high-frequency content of the images. Suppose we have a signal
$f(x)$. The first scattering coefficient is the average of the
signal, obtained by convolving the signal with an averaging
filter $\phi_J$ as $f * \phi_J$. The scattering coefficients of
the first layer are obtained by applying wavelet transforms at
different scales and orientations, taking the magnitude, and
convolving it with the low-pass filter $\phi_J$ as shown below:

$$S_{1,J}(f(x)) = |f * \psi_{j_1,\lambda_1}| * \phi_J \tag{1}$$

where $j_1$ and $\lambda_1$ denote different scales and
orientations, and $j_1 < J$. Note that removing the complex
phase of the wavelet coefficients makes these coefficients
insensitive to local translation.

To recover the high-frequency information that is eliminated
from the first-layer wavelet coefficients by averaging, we can
convolve $|f * \psi_{j_1,\lambda_1}|$ with another set of
wavelets at scale $j_2 < J$, take the absolute value of the
wavelet coefficients, and average:

$$S_{2,J}(f(x)) = ||f * \psi_{j_1,\lambda_1}| * \psi_{j_2,\lambda_2}| * \phi_J \tag{2}$$

One can show that $|f * \psi_{j_1,\lambda_1}| * \psi_{j_2,\lambda_2}$
is negligible at scales where $2^{j_1} \le 2^{j_2}$. Therefore
the coefficients are calculated only for $j_1 > j_2$.

The convolution with $\phi_J$ at the second layer removes high
frequencies and yields second-order coefficients which are
locally invariant to translation. This high-frequency
information can be restored again by finer-scale wavelet
coefficients in the next layers. To obtain the scattering
coefficients at the $k$-th layer, we perform the following
procedure iteratively $k$ times:

$$S_{k,J}(f(x)) = |\cdots||f * \psi_{j_1,\lambda_1}| * \psi_{j_2,\lambda_2}| \cdots * \psi_{j_k,\lambda_k}| * \phi_J \tag{3}$$

$$j_k < \cdots < j_2 < j_1 < J, \quad (\lambda_1, \ldots, \lambda_k) \in \Gamma^k$$

The output of the $k$-th layer of the scattering transform has a
size of $p^k \binom{J}{k}$, where $p$ denotes the number of
different orientations. In other words, there are
$p^k \binom{J}{k}$ transformed images at the output of the
$k$-th layer.
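As a quick check of this count, the short Python sketch below sums the number of transformed images over layers 0 to m, assuming the k-th layer holds p^k * C(J, k) images for J scales and p orientations. The function name `num_scattering_images` is ours and purely illustrative. With 5 scales and 6 orientations, the setting used in the experiments, two layers give the 391 images reported in Section 4.

```python
from math import comb


def num_scattering_images(num_scales: int, num_orientations: int, max_layer: int) -> int:
    """Total number of scattering images in layers 0..max_layer.

    Assumes layer k contributes num_orientations**k * C(num_scales, k) images.
    """
    return sum(
        num_orientations ** k * comb(num_scales, k)
        for k in range(max_layer + 1)
    )


# Layers 0, 1 and 2 contribute 1 + 30 + 360 images.
print(num_scattering_images(5, 6, 2))  # 391
```

Taking the mean and variance of each image doubles this count, which matches the 782 scattering features used later.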

The transformed images of the first and second layers of the
scattering transform for a sample iris image are shown in
Figures 2 and 3. These images are derived by applying a bank of
filters with 5 different scales and 6 orientations. The
scattering vector can be thought of as a cascade of convolutions
with wavelets, non-linear modulus and averaging operators, which
makes it very similar to a deep convolutional neural network.

To derive scattering features, the scattering-transformed
images of all layers up to $m$ are taken, and the mean and
variance of these images are calculated as scattering features,
which results in a vector $f_s$ of size
$2 \sum_{k=0}^{m} p^k \binom{J}{k}$. For mainstream
applications, using two or three levels of the scattering
transform will be enough.
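The cascade described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the paper: the Gaussian band-pass filters in `gabor_bank`, the low-pass filter in `lowpass`, their centre frequencies and bandwidths, and all function names are assumptions chosen for brevity, and no subsampling is performed. In this sketch a larger scale index means a coarser (lower-frequency) filter, so the second-layer wavelet is restricted to scales coarser than the first; either indexing convention gives C(J, 2) scale pairs.

```python
import numpy as np


def _freq_grid(shape):
    """Frequency coordinates (cycles per pixel) matching np.fft.fft2."""
    rows, cols = shape
    return np.fft.fftfreq(rows)[:, None], np.fft.fftfreq(cols)[None, :]


def gabor_bank(shape, num_scales, num_orientations):
    """Assumed band-pass filters, defined directly in the frequency domain."""
    fy, fx = _freq_grid(shape)
    bank = {}
    for j in range(num_scales):
        centre = 0.4 / 2 ** j          # centre frequency halves per scale
        sigma = centre / 2.0           # bandwidth proportional to centre
        for t in range(num_orientations):
            theta = np.pi * t / num_orientations
            cx, cy = centre * np.cos(theta), centre * np.sin(theta)
            bank[(j, t)] = np.exp(
                -((fx - cx) ** 2 + (fy - cy) ** 2) / (2 * sigma ** 2)
            )
    return bank


def lowpass(shape, num_scales):
    """Assumed Gaussian averaging filter phi_J in the frequency domain."""
    fy, fx = _freq_grid(shape)
    sigma = 0.4 / 2 ** num_scales
    return np.exp(-(fx ** 2 + fy ** 2) / (2 * sigma ** 2))


def conv(x, filt_hat):
    """Circular convolution via the FFT."""
    return np.fft.ifft2(np.fft.fft2(x) * filt_hat)


def scattering_features(image, num_scales=5, num_orientations=6):
    """Mean and variance of every layer-0, layer-1 and layer-2 image."""
    image = np.asarray(image, dtype=float)
    psi = gabor_bank(image.shape, num_scales, num_orientations)
    phi = lowpass(image.shape, num_scales)

    outputs = [conv(image, phi).real]                 # layer 0: f * phi_J
    for (j1, _), psi1 in psi.items():
        u1 = np.abs(conv(image, psi1))                # |f * psi_1|
        outputs.append(conv(u1, phi).real)            # layer 1
        for (j2, _), psi2 in psi.items():
            if j2 > j1:                               # coarser second wavelet
                u2 = np.abs(conv(u1, psi2))           # ||f * psi_1| * psi_2|
                outputs.append(conv(u2, phi).real)    # layer 2
    return np.array([s for out in outputs for s in (out.mean(), out.var())])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = scattering_features(rng.random((32, 32)))
    print(features.shape)  # (782,)
```

For 5 scales and 6 orientations the sketch produces 1 + 30 + 360 = 391 images and therefore 782 features, the same vector length as in the paper, although the filter shapes differ from those of a true wavelet-based scattering network.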



Fig. 2. The images from the first layer of scattering transform 



Fig. 3. The images from the second layer of scattering transform

2.2. Textural Features 

Textural, spectral and contextual features are the three
fundamental pattern elements in recognition. In human
interpretation of color photographs, textural features contain
the spatial information of intensity variation in a single band
[25]. To mimic the human visual system, several features have
been introduced to capture the textural information of an image.
Among them, Haralick features and local binary patterns (LBP)
are two major groups of textural features. Haralick textural
features are derived from the co-occurrence matrix of the image.
Local binary patterns are derived from the relative comparison
between each pixel and its neighboring pixels. There are also
various modified versions of LBP features, such as transition
local binary patterns, direction-coded local binary patterns and
volume local binary patterns (VLBP). In this work, Haralick
features are used to capture the textural information of the
image. To extract Haralick features, we first need to derive the
co-occurrence matrix. The co-occurrence matrix measures the
distribution of co-occurring intensity values for a given
offset. Suppose we represent the image as a two-dimensional
function which maps pairs of coordinates to intensity values,
i.e., $I : X \times Y \to G$, where
$X = \{1, 2, 3, \ldots, N_x\}$, $Y = \{1, 2, 3, \ldots, N_y\}$,
and $G$ denotes the set of all possible grayscale levels. Then
the co-occurrence matrix $P$ of image $I$ with the offset
$(\Delta_x, \Delta_y)$, written $P_{\Delta_x,\Delta_y}(i,j)$,
can be defined as:

$$P_{\Delta_x,\Delta_y}(i,j) = \sum_{m=1}^{N_x} \sum_{n=1}^{N_y} \delta(I(m,n) - i)\, \delta(I(m + \Delta_x, n + \Delta_y) - j) \tag{4}$$

where $\delta(x)$ denotes the discrete Dirac function. It should
be noted that the co-occurrence matrix has a size of
$N_g \times N_g$, where $N_g$ denotes the number of gray levels
in the image. The offset $(\Delta_x, \Delta_y)$ depends on the
direction $\theta$, which can be defined as:

$$\theta = \tan^{-1}\left(\frac{\Delta_y}{\Delta_x}\right) \tag{5}$$


Here we have derived the co-occurrence matrix for the offset
$(\Delta_x, \Delta_y) = (1, 0)$. In our work, the textural
features are extracted at the block level. Each image is divided
into non-overlapping blocks of size $N \times N$, the
co-occurrence matrix of each block is derived, and 14 features
are extracted from it. More details about the derivation of
these 14 textural features from the co-occurrence matrix are
provided in the appendix. The features from different blocks are
then concatenated to form a longer feature vector. If an image
has a size of $S_1 \times S_2$, the total number of textural
features will be:

$$M = \frac{14\, S_1 S_2}{N^2}$$

In our work the textural features are derived in a slightly
different way from the original paper [25], but they are very
similar. Here the co-occurrence matrix is found only for a
single-pixel horizontal shift (corresponding to $\theta = 0$).
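The block-level co-occurrence computation can be sketched in a few lines of NumPy. This is an illustration under our own assumptions rather than the authors' code: gray levels are taken to be integers in 0..num_levels-1, the horizontal neighbor is the next column of the array, and the helper names `cooccurrence_matrix` and `split_into_blocks` are hypothetical.

```python
import numpy as np


def cooccurrence_matrix(block, num_levels):
    """Co-occurrence counts P(i, j) for a one-pixel horizontal shift.

    P[i, j] counts pixels with gray level i whose right-hand neighbor
    has gray level j. `block` must hold integers in 0..num_levels-1.
    """
    left = block[:, :-1].ravel()
    right = block[:, 1:].ravel()
    P = np.zeros((num_levels, num_levels), dtype=np.int64)
    np.add.at(P, (left, right), 1)   # accumulate repeated (i, j) pairs
    return P


def split_into_blocks(image, block_size):
    """Non-overlapping block_size x block_size blocks (partial blocks dropped)."""
    rows, cols = image.shape
    return [
        image[r:r + block_size, c:c + block_size]
        for r in range(0, rows - block_size + 1, block_size)
        for c in range(0, cols - block_size + 1, block_size)
    ]


if __name__ == "__main__":
    image = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [0, 2, 2, 2],
                      [2, 2, 3, 3]])
    P = cooccurrence_matrix(image, num_levels=4)
    print(P)
    print(len(split_into_blocks(np.zeros((8, 12), dtype=int), 4)))  # 6 blocks
```

Each block then yields its own matrix, from which the 14 features are computed, so an image split into B blocks contributes 14 * B textural features.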

After deriving the scattering and textural features, we
concatenate them to form the feature vector of each iris image
as $f = [f_s, f_t]$, where $f_s$ and $f_t$ denote the scattering
and textural features respectively.


2.3. Principal Component Analysis 

Principal component analysis (PCA), also known as the
Karhunen-Loeve transformation, is a powerful algorithm used for
dimensionality reduction [26]. Given a set of correlated
variables, PCA transforms them into another domain such that the
transformed variables are linearly uncorrelated. These linearly
uncorrelated variables are called principal components. PCA is
usually defined so that the first principal component has the
largest possible variance, the second has the second largest
variance, and so on. Therefore, after applying PCA, we can keep
only the subset of principal components with the largest
variance to reduce the dimensionality. PCA can be thought of as
fitting a $k$-dimensional ellipsoid to a set of data, where each
axis of the ellipsoid represents a principal component. There
are also other dimensionality reduction algorithms based on PCA,
such as kernel PCA and sparse PCA. PCA has many applications in
computer vision. Eigenfaces is one representative application,
where PCA is used for face recognition [27].

Without going into too much detail, let us assume we have a
dataset of $N$ iris images and $\{f_1, f_2, \ldots, f_N\}$
denote their features. Also let us assume that each feature
vector has dimensionality $d$. To apply PCA, all features first
need to be centered by removing their mean:
$z_i = f_i - \bar{f}$, where
$\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f_i$. Then the covariance
matrix of the centered features is calculated:

$$C = \sum_{i=1}^{N} z_i z_i^T \tag{6}$$

Next the eigenvalues $\lambda_j$ and eigenvectors $\nu_j$ of the
covariance matrix $C$ are computed. Suppose the $\lambda_j$'s
are sorted in decreasing order. Then each $z_i$ can be written
as $z_i = \sum_{j=1}^{d} \alpha_j \nu_j$. We can reduce the
dimensionality of the data by projecting them onto the first
$k$ ($\ll d$) principal vectors as:

$$\hat{z}_i = (\nu_1^T z_i, \nu_2^T z_i, \ldots, \nu_k^T z_i) = (\alpha_1, \ldots, \alpha_k)$$

By keeping $k$ principal components, the percentage of retained
variance can be found as
$\sum_{j=1}^{k} \lambda_j / \sum_{j=1}^{d} \lambda_j$. Hence one
simple way to choose $k$ is to pick the smallest value for which
the above ratio is at least $\epsilon$, where $\epsilon$ is
usually chosen between 95% and 99%.
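The steps above (centering, covariance, eigen-decomposition and choosing k by retained variance) can be sketched with NumPy as follows. The function names `pca_fit` and `pca_transform` and the default threshold are illustrative assumptions; the sketch uses the unnormalized covariance of equation (6) and picks the smallest k whose retained-variance ratio reaches the threshold.

```python
import numpy as np


def pca_fit(features, retained=0.99):
    """Fit PCA on an (N, d) array with one feature vector per row.

    Returns the mean vector, a (d, k) basis of principal vectors, and k,
    the smallest number of components retaining at least `retained`
    of the total variance.
    """
    mean = features.mean(axis=0)
    Z = features - mean                        # centered features z_i
    C = Z.T @ Z                                # sum_i z_i z_i^T
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # re-sort in decreasing order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, retained)) + 1, len(eigvals))
    return mean, eigvecs[:, :k], k


def pca_transform(features, mean, basis):
    """Project centered features onto the retained principal vectors."""
    return (features - mean) @ basis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: two high-variance directions, three negligible ones.
    X = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 0.01, 0.01, 0.01])
    mean, basis, k = pca_fit(X, retained=0.99)
    print(k, pca_transform(X, mean, basis).shape)
```

New test samples should be projected with the mean and basis estimated from the training data, so that training and test features live in the same reduced space.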

3. RECOGNITION ALGORITHM: MINIMUM DISTANCE CLASSIFIER

There are various classifiers which can be used for this task,
including majority voting, support vector machines, neural
networks and the minimum distance classifier. In this work the
minimum distance classifier is used, which is quite popular for
template matching problems. One benefit of the minimum distance
classifier is that it does not need any training, making it much
faster than most other classifiers. As long as the features are
discriminative enough to separate different classes, the minimum
distance classifier will provide high accuracy; otherwise other
classifiers would be a better option. The minimum distance
classifier computes the distance between the features of an
unknown subject and those of each training sample, and picks the
training sample with the minimum distance to the unknown as the
answer. In equation form, if we denote the features of the test
subject by $F^*$ and those of training sample $i$ by $F^{(i)}$,
the test subject is matched to the sample that satisfies the
following:

$$i^* = \arg\min_i \left[ \mathrm{dis}(F^*, F^{(i)}) \right] \tag{7}$$

We have used the Euclidean distance as our distance metric.

4. EXPERIMENTAL RESULTS AND ANALYSIS

This section presents a detailed description of the
experimental results. Before showing the results, let us
describe the parameter values of our algorithm. For each image,
the scattering transform is applied up to two levels with a
filter bank of 5 scales and 6 orientations, resulting in 391
transformed images (1 + 30 + 360 for layers 0, 1 and 2). The
mean and variance of each scattering-transformed image are used
as features, resulting in 782 scattering features. The
scattering features are derived using the software
implementation provided by Mallat's group [28].

To extract textural features, each image is divided into 12
smaller blocks and 14 features are derived from each of them,
which results in a total of 168 textural features. Therefore the
concatenated feature vector has a length of 950. Then PCA is
applied to all features, and the first 80 PCA features (which
retain above 99% of the energy of the initial features) are used
for recognition. The minimum distance classifier is used for
template matching.

We have tested our algorithm on a popular iris database
collected by IIT Delhi. This database contains 2240 iris images
captured from 224 different people. The images of 21 people
(around 10%) are used as a validation set to find the optimum
values of the parameters of the algorithm, and the rest are used
for evaluation. For each person, around half of the images are
used for training and the rest for testing.

Figure 4 shows the recognition accuracy for different numbers
of PCA features. Interestingly, even with few PCA features we
are able to get a very high accuracy rate. As can be seen, using
80 PCA features results in an accuracy rate above 99%, which
does not increase with more PCA features.

Fig. 4. Recognition accuracy as a function of number of PCA
features


Table 1 provides a comparison of the performance of the
proposed scheme with that of other recent algorithms. The
accuracy of the proposed scheme is reported as the highest rate
achieved with 80 PCA features. As can be seen, by combining
scattering and textural features we are able to outperform
previous approaches. This is mainly due to the richness of both
scattering and Haralick features, which are able to capture
high-frequency patterns of irises and thus provide very high
discriminating power. One main advantage of this scheme is that
it does not require segmentation of the iris from eye images
(although segmentation could improve the results for some
difficult cases).
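The matching step behind these accuracy figures, a minimum distance classifier with Euclidean distance, can be sketched as follows. The function names `nearest_template` and `accuracy` and the toy data are hypothetical, and the feature vectors are assumed to be already PCA-reduced; this is an illustration, not the authors' MATLAB code.

```python
import numpy as np


def nearest_template(test_feature, train_features, train_labels):
    """Return the label of the training sample closest to test_feature.

    train_features is an (N, k) array; the distance is Euclidean.
    """
    distances = np.linalg.norm(train_features - test_feature, axis=1)
    best = int(np.argmin(distances))
    return train_labels[best], float(distances[best])


def accuracy(test_features, test_labels, train_features, train_labels):
    """Fraction of test samples matched to a template of the right person."""
    hits = sum(
        nearest_template(x, train_features, train_labels)[0] == y
        for x, y in zip(test_features, test_labels)
    )
    return hits / len(test_labels)


if __name__ == "__main__":
    # Toy templates for two hypothetical subjects.
    train = np.array([[0.0, 0.0], [0.5, 0.2], [10.0, 10.0], [9.5, 10.3]])
    labels = ["subject_a", "subject_a", "subject_b", "subject_b"]
    test = np.array([[0.2, 0.1], [9.8, 9.9]])
    print(accuracy(test, ["subject_a", "subject_b"], train, labels))
```

Because no model is fitted, enrolling a new person only requires appending that person's feature vectors and labels to the template arrays.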





Table 1. Comparison with other algorithms for iris recognition

Method                                    Recognition rate
Haar Wavelet [7]                          96.6%
Log Gabor Filter by Kumar [7]             97.19%
Fusion [7]                                97.41%
Elastic Graph Matching [8]                98%
Proposed scheme using 80 PCA features     99.2%


The experiments are performed using MATLAB 2012 on a laptop
with a Core i5 CPU running at 2.6 GHz. The execution time of the
proposed scheme is about 11 ms per image, which is fast enough
for real-time applications.

5. CONCLUSION

This paper proposed a set of scattering and textural features
for iris recognition. The scattering features are extracted
globally, while the textural features are extracted locally.
Scattering features are locally invariant and carry a great deal
of high-frequency information which is lost in other descriptors
such as SIFT and HOG. This high-frequency information provides
great discriminating power for iris recognition. Principal
component analysis is applied to the features to reduce their
dimensionality. Then a minimum distance classifier is used to
match new iris images with training images. This algorithm is
tested on a well-known dataset, and a high accuracy rate is
achieved which outperforms the previous best results on this
dataset. In the future, we will investigate applying the
proposed set of features to more challenging iris datasets and
to other biometric recognition problems.

Appendix. More details on Haralick textural features

To find textural features from the co-occurrence matrix, we
first need to define the following terms, which are used in the
derivation of Haralick features.
$p(i,j) = P(i,j) / \sum_i \sum_j P(i,j)$ denotes the normalized
co-occurrence matrix, which can be thought of as a probability
distribution. $p_x(i)$ and $p_y(j)$ denote the marginal
probabilities along $x$ and $y$. $p_{x+y}(k)$ and $p_{x-y}(k)$
denote the probabilities of $x + y$ and $x - y$. $HXY$ denotes
the entropy of $p(i,j)$, $HX$ and $HY$ denote the entropies of
$p_x$ and $p_y$, and:

$$HXY1 = -\sum_i \sum_j p(i,j) \log(p_x(i)\, p_y(j))$$

$$HXY2 = -\sum_i \sum_j p_x(i)\, p_y(j) \log(p_x(i)\, p_y(j))$$

$$Q(i,j) = \sum_k \frac{p(i,k)\, p(j,k)}{p_x(i)\, p_y(k)}$$


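Using the above terms, the following 14 textural features can
be derived for each image:

$$f_1 = \sum_i \sum_j p(i,j)^2 \quad \text{(Angular Second Moment)}$$

$$f_2 = \sum_{k=0}^{N_g-1} k^2\, p_{x-y}(k) \quad \text{(Contrast)}$$

$$f_3 = \frac{\sum_i \sum_j (ij)\, p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y} \quad \text{(Correlation)}$$

$$f_4 = \sum_i \sum_j (i - \mu)^2\, p(i,j) \quad \text{(Variance)}$$

$$f_5 = \sum_i \sum_j \frac{p(i,j)}{1 + (i - j)^2} \quad \text{(Inverse Difference Moment)}$$

$$f_6 = \sum_{k=2}^{2N_g} k\, p_{x+y}(k) \quad \text{(Sum Average)}$$

$$f_7 = \sum_{k=2}^{2N_g} (k - f_6)^2\, p_{x+y}(k) \quad \text{(Sum Variance)}$$

$$f_8 = -\sum_{k=2}^{2N_g} p_{x+y}(k) \log(p_{x+y}(k)) \quad \text{(Sum Entropy)}$$

$$f_9 = -\sum_i \sum_j p(i,j) \log(p(i,j)) \quad \text{(Entropy)}$$

$$f_{10} = \sum_{k=0}^{N_g-1} (k - \mu_{x-y})^2\, p_{x-y}(k) \quad \text{(Difference Variance)}$$

$$f_{11} = -\sum_{k=0}^{N_g-1} p_{x-y}(k) \log p_{x-y}(k) \quad \text{(Difference Entropy)}$$

$$f_{12} = \frac{HXY - HXY1}{\max\{HX, HY\}}$$

$$f_{13} = \sqrt{1 - \exp[-2(HXY2 - HXY)]}$$

$$f_{14} = \text{Second largest eigenvalue of } Q$$

Here $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ are the means
and standard deviations of $p_x$ and $p_y$, $\mu$ is the mean
gray level, and $\mu_{x-y}$ is the mean of $p_{x-y}$.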
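A few of the features listed above can be computed directly from a co-occurrence matrix, as in the NumPy sketch below. It covers only a subset (angular second moment, contrast, inverse difference moment, sum average, and the three entropies) and makes several assumptions of its own: the function name `haralick_subset` is hypothetical, the natural logarithm is used, zero-probability entries are skipped in the entropies, the difference distribution is indexed by |i - j|, and gray levels are treated as one-based when forming the sum index so that k runs from 2 to 2 * N_g.

```python
import numpy as np


def haralick_subset(P):
    """Compute a subset of Haralick features from co-occurrence counts P."""
    p = P / P.sum()                       # normalized matrix p(i, j)
    n = p.shape[0]
    i, j = np.indices(p.shape)            # zero-based gray-level indices

    # p_{x-y}(k) for k = 0..n-1 and p_{x+y} for zero-based sums 0..2n-2.
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=p.ravel(), minlength=n)
    p_sum = np.bincount((i + j).ravel(), weights=p.ravel(), minlength=2 * n - 1)

    def entropy(q):
        q = q[q > 0]                      # skip zero entries (0 * log 0 := 0)
        return float(-(q * np.log(q)).sum())

    k_diff = np.arange(n)
    k_sum = np.arange(2 * n - 1) + 2      # shift to one-based levels: 2..2n

    return {
        "angular_second_moment": float((p ** 2).sum()),
        "contrast": float((k_diff ** 2 * p_diff).sum()),
        "inverse_difference_moment": float((p / (1 + (i - j) ** 2)).sum()),
        "sum_average": float((k_sum * p_sum).sum()),
        "sum_entropy": entropy(p_sum),
        "entropy": entropy(p),
        "difference_entropy": entropy(p_diff),
    }


if __name__ == "__main__":
    uniform = np.ones((2, 2))             # every gray-level pair equally likely
    for name, value in haralick_subset(uniform).items():
        print(f"{name}: {value:.4f}")
```

For the uniform 2 x 2 example, every entry of p equals 0.25, so the angular second moment is 0.25, the contrast is 0.5 and the entropy is log 4.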